
Conversation

@glide-the
Contributor

What does this PR do?

Added CogVideoX's advanced inference and model introduction.

@sayakpaul

@sayakpaul
Member

@stevhliu @a-r-r-o-w

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@a-r-r-o-w left a comment


Nice to have this! Redirecting to @stevhliu for a deeper review.

Instead of uploading the gif/png here, could you open a PR to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/diffusers, which I will merge so we can link it here? We don't keep images/videos in this repository, otherwise it can get quite bulky to clone.

@a-r-r-o-w requested a review from stevhliu October 5, 2024 20:14
@glide-the
Contributor Author

> Nice to have this! Redirecting to @stevhliu for a deeper review.
>
> Instead of uploading the gif/png here, could you open a PR to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/diffusers, which I will merge so we can link it here? We don't keep images/videos in this repository, otherwise it can get quite bulky to clone.

Image moved in https://huggingface.co/datasets/huggingface/documentation-images/discussions/371

Member

@stevhliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool!! I did an initial pass over the docs and will follow up with a more in-depth look soon 🙂

specific language governing permissions and limitations under the License.
-->
# CogVideoX
CogVideoX is an open-source version of the video generation model originating from QingYing. The table below displays the list of video generation models we currently offer, along with their foundational information.

It would be nice to briefly describe the technical aspects of CogVideoX so users have a better idea of how it works and what makes it different from other models (check out the Stable Diffusion XL doc as an example).


Maybe something like (feel free to copy/reuse in the training doc as well):

CogVideoX is a text-to-video generation model focused on creating more coherent videos aligned with a prompt. It achieves this using several methods.

- a 3D variational autoencoder that compresses videos spatially and temporally, improving compression rate and video accuracy.
- an expert transformer block to help align text and video, and a 3D full attention module for capturing and creating spatially and temporally accurate videos.
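
For anyone landing on this thread without context, a minimal text-to-video sketch along these lines may help ground the description above. The checkpoint name and generation settings here are assumptions (based on the publicly released `THUDM/CogVideoX-2b` weights), not necessarily what the doc page ends up using:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the CogVideoX text-to-video pipeline; fp16 keeps memory manageable on a single GPU
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "A panda playing a small guitar in a bamboo forest, cinematic lighting"

# 49 frames at 8 fps is roughly a 6-second clip
video = pipe(
    prompt=prompt,
    num_frames=49,
    guidance_scale=6.0,
    num_inference_steps=50,
).frames[0]

export_to_video(video, "cogvideox_output.mp4", fps=8)
```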

> [!TIP]
> You can pass `--use_8bit_adam` to reduce the memory requirements of training.
> [!IMPORTANT]

This should also just be plain text rather than a callout.
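
Side note for readers of this thread: as I understand it, `--use_8bit_adam` swaps the standard AdamW optimizer for the 8-bit variant from bitsandbytes, which stores optimizer states in 8 bits to cut memory. A rough sketch of the underlying idea (the model here is just a stand-in, and the exact wiring inside the training script isn't shown):

```python
import bitsandbytes as bnb
import torch

# Stand-in module; in the training script this would be the trainable transformer parameters
model = torch.nn.Linear(1024, 1024)

# 8-bit AdamW keeps optimizer state in 8-bit precision, reducing memory vs. torch.optim.AdamW
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4, weight_decay=1e-2)
```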

glide-the and others added 19 commits October 11, 2024 21:59
Member

@stevhliu left a comment


Cool, thanks so much for iterating! Just a few more comments and then we can merge 🙂


-->
# CogVideoX

🤗 Diffusers is Hugging Face's open-source library for diffusion models. Its modular tools integrate conveniently and quickly with custom frameworks. For model training, Diffusers supports acceleration via Accelerate and is compatible with common inference frameworks.

Replace this paragraph with the suggestion (or something like that) from using-diffusers/cogvideox.md since users coming to Diffusers are probably already familiar with it. They want to know more about CogVideoX :)
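
Since the page is about advanced inference, one concrete direction the replacement paragraph could lead into is the memory optimizations the pipeline supports. A minimal sketch, assuming the `THUDM/CogVideoX-2b` checkpoint and the offloading/tiling options available in Diffusers (not necessarily the exact set the doc ends up showing):

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)

# Keep only the submodule currently in use on the GPU; the rest stays on the CPU
pipe.enable_model_cpu_offload()

# Decode the video latents in tiles and slices so the VAE doesn't spike memory
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
```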

@yiyixuxu
Collaborator

@stevhliu is this good to merge now?

@stevhliu
Member

Yeah looks good now. Thanks for iterating and improving on the docs @glide-the! 🤗

@yiyixuxu merged commit 0d935df into huggingface:main Oct 16, 2024
1 check passed
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* CogVideoX docs


---------

Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>